Hi, it's time to look at probability theory again, and we're going to start with random
variables, specifically continuous random variables. The things we're going to talk about today
are usually spread out over a few lectures in a classic probability course. We will review
those topics, and my goal here is to give you intuition if you haven't seen them before,
or a refresher if you have seen these results before. Everything today will be about random
variables in one dimension, taking values in R. These things generalize to higher-dimensional
variables, but we'll draw everything in 1D. So let's start. We call X a random variable if it
is a mapping from some abstract probability space, with sample space Omega, sigma-algebra A,
and probability measure P, into R. So X takes values in R.
I'd like you to forget almost everything about that probability space. It is important theory,
but it will not matter that much in this course, so if you haven't seen sigma-algebras before,
don't worry too much about that. A random variable has a probability density function (PDF for
short), rho_X, and a cumulative distribution function (CDF), F_X. Those two are connected via
the following formula.
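
In symbols (my rendering, since the formula itself is on the board), this is presumably the standard relation:

```latex
F_X(r) \;=\; P(X \le r) \;=\; \int_{-\infty}^{r} \rho_X(x)\,dx .
```
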
The probability that X takes on values less than or equal to r is the CDF evaluated at r, and it
can also be computed by integrating the density from minus infinity to r. So let's look at that
density. Say this is the density rho_X, as a function of x in one dimension, and let's say this
is r. Then the area underneath the curve to the left of this value r is the probability that X
takes on values less than or equal to r. We will only rarely use the CDF; we will mostly look at
the PDF rho_X. But sometimes we have to look at such events, and then the CDF comes in handy.
In general, though, you should think about random variables in terms of the density rho_X.
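
As a quick numerical sanity check of this PDF/CDF relation (this is not from the lecture; the standard normal is just an example density), one could do something like:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

r = 0.7
cdf_value = stats.norm.cdf(r)                       # F_X(r), the CDF at r
pdf_integral, _ = quad(stats.norm.pdf, -np.inf, r)  # integral of rho_X from -infinity to r

print(cdf_value, pdf_integral)   # both are approximately 0.758
```
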
For more general events, X in a set A, we can also integrate the density over A. So let's again
draw something like that. A could be a union of intervals, or something more complicated. The
probability that X lies in those two sets, this is A, is then the area under the graph of the
density over A. That is the probability. One thing that obviously has to hold is that the density
rho_X integrates to one over the whole domain: the integral of rho_X over all of R is equal to one.
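
A small sketch of the same idea for a more general event (again not from the lecture; the standard normal and the particular set A are just example choices): integrate the density over each piece of A, and check that the total mass is one.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# A = [-2, -1] union [0.5, 1.5], an example union of two intervals
A = [(-2.0, -1.0), (0.5, 1.5)]

# P(X in A): integrate the density over each interval and add the pieces
p_A = sum(quad(stats.norm.pdf, a, b)[0] for a, b in A)

# the density must integrate to one over all of R
total_mass, _ = quad(stats.norm.pdf, -np.inf, np.inf)

print(p_A, total_mass)   # roughly 0.378 and 1.0
```
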
The expectation of a random variable is given by the following quantity. It is essentially just
an integral against the density, but with this extra factor of x in the integrand. In my opinion,
the best way to see why this is the right definition is the following: let's approximate the
integral by a discrete sum, so roughly the sum over k of x_k times rho_X(x_k) times Delta x_k.
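
Written out (again my rendering of what is on the board), the definition and its discrete approximation read:

```latex
\mathbb{E}[X] \;=\; \int_{\mathbb{R}} x\,\rho_X(x)\,dx \;\approx\; \sum_k x_k\,\rho_X(x_k)\,\Delta x_k .
```
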
Now let's think about what that means. Let's say this is the density, and maybe here is zero.
What the sum says is that the expectation adds up position vectors. So here is x_k, and this
vector is the vector x_k. We sum up all those x_k for the various values of k, so here is another
point x_l, and each vector is weighted by the height of the density at that point. So this
position vector here, this x_k, gets a really strong contribution; this position vector here gets
a slightly smaller contribution; and if we take something far out here, it gets only a very weak
contribution. So this is a center of mass in some sense.
We take an average: the weights sum to one, so it is a weighted average of position vectors, and
points with a higher density are weighted more strongly. That is a really good picture for the
expectation. It is essentially a center of mass, an average, a mean: where the probability mass is
concentrated on average. The variance is a measure of how far the distribution is spread out. We
do something similar there: we measure deviations from the mean and weight them by the probability
density at that point. But I'm sure you have seen mean and variance at some point before.
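
As a sketch of this discrete-sum picture (not from the lecture; a normal distribution with mean 1 and standard deviation 2 is just an example whose answers we already know), one can approximate E[X] = integral of x * rho_X(x) dx and Var(X) = integral of (x - E[X])^2 * rho_X(x) dx by Riemann sums:

```python
import numpy as np
from scipy import stats

mu, sigma = 1.0, 2.0                                        # example: normal with known mean and variance
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 20001)    # grid points x_k
dx = x[1] - x[0]                                            # grid spacing, Delta x_k
rho = stats.norm.pdf(x, loc=mu, scale=sigma)                # density values rho_X(x_k)

mean = np.sum(x * rho) * dx                  # sum_k x_k * rho_X(x_k) * Delta x_k
var = np.sum((x - mean) ** 2 * rho) * dx     # same weighting, applied to squared deviations

print(mean, var)                             # approximately 1.0 and 4.0
```
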
So let's continue. When we do Bayesian inference, we often have two different objects, say a
parameter and an observable, so we will definitely have to work with two random variables at once.
That means we need a concept of two variables being jointly distributed. Let's look at this
formula first.
The probability that X and Y lie in some two-dimensional set A is the integral of the joint
density over this domain A, which makes sense because it is the 2D generalization of what we just
said in one dimension.
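
A 2D sketch to close the loop (not from the lecture; two independent standard normals and a rectangular A are example choices, and independence means the joint density is just the product of the marginals): approximate P((X, Y) in A) by a Riemann sum on a grid and compare with the closed-form answer.

```python
import numpy as np
from scipy import stats

# A = [-1, 1] x [0, 2], an example rectangle in the plane
a, b, c, d = -1.0, 1.0, 0.0, 2.0
x = np.linspace(a, b, 801)
y = np.linspace(c, d, 801)
dx, dy = x[1] - x[0], y[1] - y[0]

X, Y = np.meshgrid(x, y)
rho_xy = stats.norm.pdf(X) * stats.norm.pdf(Y)   # joint density of independent standard normals

p_A = np.sum(rho_xy) * dx * dy                   # Riemann sum for the double integral over A

# closed form for this particular A, via products of one-dimensional CDF differences
exact = (stats.norm.cdf(b) - stats.norm.cdf(a)) * (stats.norm.cdf(d) - stats.norm.cdf(c))

print(p_A, exact)                                # both approximately 0.326
```
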